Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Source-postgres] Set default cursor value for cdc mode #27442

Merged
merged 8 commits into from
Jul 13, 2023

Conversation

nguyenaiden
Copy link
Contributor

@nguyenaiden nguyenaiden commented Jun 16, 2023

What

26807
To prepare for Destination v2, Cdc-enabled DB sources need to have a pre-defined cursor for normalization to operate properly.

How

During the discover phase, source_defined_cursor is set to true, and _ab_cdc_lsn is selected as the predefined cursor. NOTE: This value could change if we believe it's not unique.
image

AirbyteStateMessage:

{
  "cdc": true,
  "cdc_state": {
    "state": {
      "[\"postgres\",{\"server\":\"postgres\"}]": "{\"transaction_id\":null,\"lsn\":840386496,\"txId\":1895971,\"ts_usec\":1686867930174234}"
    }
  },
  "streams": [
    {
      "stream_name": "blah",
      "stream_namespace": "public",
      "cursor_field": [
        "_ab_cdc_lsn"
      ]
    },
    {
      "stream_name": "jsonb_test_norm",
      "stream_namespace": "public",
      "cursor_field": [
        "_ab_cdc_lsn"
      ]
    }
  ]
}

Recommended reading order

  1. PostgresCatalogHelper.java
  2. PostgresSource.java
  3. All the test files

🚨 User Impact 🚨

New CDC syncs that have refreshed their source schema will see this cursor field be chosen. Customers who reset their data will continue to see old behavior until their source goes through the discover() phase again.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 16, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan and you've followed all steps in the Breaking Changes Checklist
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • The connector tests are passing in CI
  • You've updated the connector's metadata.yaml file (new!)
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 16, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (3)

Connector Version Changelog Publish
source-alloydb 2.0.28
source-alloydb-strict-encrypt 2.0.28 🔵
(ignored)
🔵
(ignored)
source-postgres-strict-encrypt 2.0.33 🔵
(ignored)
🔵
(ignored)
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items

(click to expand)

Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the cloud or oss registry, so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that you have added a metadata.yaml file and the expected registries are enabled.

@octavia-squidington-iii
Copy link
Collaborator

source-alloydb-strict-encrypt test report (commit fa057b48c9) - ❌

⏲️ Total pipeline duration: 369 seconds

Step Result
Validate airbyte-integrations/connectors/source-alloydb-strict-encrypt/metadata.yaml
Connector version semver check.
Connector version increment check.
QA checks
Build connector tar
Build source-alloydb-strict-encrypt docker image for platform linux/x86_64
Unit tests
Integration tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-alloydb-strict-encrypt test

@github-actions
Copy link
Contributor

github-actions bot commented Jun 16, 2023

Coverage report for source-postgres

File Coverage [89.6%] 🍏
PostgresCatalogHelper.java 100% 🍏
PostgresSource.java 88.5% 🍏
Total Project Coverage 69.94% 🍏

@octavia-squidington-iii
Copy link
Collaborator

source-postgres-strict-encrypt test report (commit fa057b48c9) - ❌

⏲️ Total pipeline duration: 610 seconds

Step Result
Validate airbyte-integrations/connectors/source-postgres-strict-encrypt/metadata.yaml
Connector version semver check.
Connector version increment check.
QA checks
Build connector tar
Build source-postgres-strict-encrypt docker image for platform linux/x86_64
Unit tests
Integration tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres-strict-encrypt test

@octavia-squidington-iii
Copy link
Collaborator

source-mssql test report (commit fa057b48c9) - ❌

⏲️ Total pipeline duration: 2163 seconds

Step Result
Validate airbyte-integrations/connectors/source-mssql/metadata.yaml
Connector version semver check.
Connector version increment check.
QA checks
Build connector tar
Build source-mssql docker image for platform linux/x86_64
Unit tests
Integration tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mssql test

@octavia-squidington-iii
Copy link
Collaborator

source-alloydb test report (commit fa057b48c9) - ❌

⏲️ Total pipeline duration: 321 seconds

Step Result
Validate airbyte-integrations/connectors/source-alloydb/metadata.yaml
Connector version semver check.
Connector version increment check.
QA checks
Build connector tar
Build source-alloydb docker image for platform linux/x86_64
Unit tests
Integration tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-alloydb test

@octavia-squidington-iii
Copy link
Collaborator

source-alloydb-strict-encrypt test report (commit 48b836dd39) - ❌

⏲️ Total pipeline duration: 325 seconds

Step Result
Validate airbyte-integrations/connectors/source-alloydb-strict-encrypt/metadata.yaml
Connector version semver check.
Connector version increment check.
QA checks
Build connector tar
Build source-alloydb-strict-encrypt docker image for platform linux/x86_64
Unit tests
Integration tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-alloydb-strict-encrypt test

@octavia-squidington-iii
Copy link
Collaborator

source-mssql test report (commit 48b836dd39) - ❌

⏲️ Total pipeline duration: 2162 seconds

Step Result
Validate airbyte-integrations/connectors/source-mssql/metadata.yaml
Connector version semver check.
Connector version increment check.
QA checks
Build connector tar
Build source-mssql docker image for platform linux/x86_64
Unit tests
Integration tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mssql test

@octavia-squidington-iii
Copy link
Collaborator

source-mysql-strict-encrypt test report (commit 48b836dd39) - ❌

⏲️ Total pipeline duration: 1796 seconds

Step Result
Validate airbyte-integrations/connectors/source-mysql-strict-encrypt/metadata.yaml
Connector version semver check.
Connector version increment check.
QA checks
Build connector tar
Build source-mysql-strict-encrypt docker image for platform linux/x86_64
Unit tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mysql-strict-encrypt test

@octavia-squidington-iii
Copy link
Collaborator

source-postgres test report (commit 48b836dd39) - ❌

⏲️ Total pipeline duration: 1247 seconds

Step Result
Validate airbyte-integrations/connectors/source-postgres/metadata.yaml
Connector version semver check.
Connector version increment check.
QA checks
Build connector tar
Build source-postgres docker image for platform linux/x86_64
Unit tests
Integration tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres test

Copy link
Contributor

@prateekmukhedkar prateekmukhedkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to confirm, _ab_cdc_lsn field and associated value is already present in the data that we serialize, and that is why we don't need to add this field as we reading data?

@nguyenaiden
Copy link
Contributor Author

@prateekmukhedkar yes, we are currently emitting the lsn value in the data we serialize so we don't need to add a new column.

@edgao
Copy link
Contributor

edgao commented Jun 23, 2023

running legacy normalization tests in #27670 (comment). Will approve once those look good.

Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests are green on #27670. :shipit:

thanks for putting this together so quickly!

Copy link
Contributor

@subodh1810 subodh1810 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THis looks fine. I just want you to carry out few tests

  1. Create a postgres source cdc connector on an old version with a destination that supports normalization (snowflake), run a sync and then change image version of the connector to use changes from this PR and then run a sync and make sure everything works.
  2. Do everything as mentioned in step 1 but this time refresh the schema as well.
  3. Create a connector on this new version and then roll back to the old version and see what happens (both with and without refresh schema)

@prateekmukhedkar prateekmukhedkar added the breaking-change Don't merge me unless you are ready. label Jul 12, 2023
@nguyenaiden nguyenaiden force-pushed the duy/cdc-source-defined-cursor branch from 48b836d to 052a5ae Compare July 12, 2023 18:13
@octavia-squidington-iii
Copy link
Collaborator

source-postgres test report (commit 052a5aee9d) - ❌

⏲️ Total pipeline duration: 10mn18s

Step Result
Validate airbyte-integrations/connectors/source-postgres/metadata.yaml
Connector version semver check
QA checks
Build connector tar
Build source-postgres docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-postgres:integrationTest
Acceptance tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres test

@octavia-squidington-iii
Copy link
Collaborator

source-mysql test report (commit 052a5aee9d) - ❌

⏲️ Total pipeline duration: 01mn29s

Step Result
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
QA checks
Build connector tar
Build source-mysql docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql:integrationTest
Acceptance tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mysql test

@octavia-squidington-iii
Copy link
Collaborator

source-postgres-strict-encrypt test report (commit 052a5aee9d) - ❌

⏲️ Total pipeline duration: 05mn43s

Step Result
Validate airbyte-integrations/connectors/source-postgres-strict-encrypt/metadata.yaml
Connector version semver check
QA checks
Build connector tar
Build source-postgres-strict-encrypt docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-postgres-strict-encrypt:integrationTest
Acceptance tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres-strict-encrypt test

@octavia-squidington-iii
Copy link
Collaborator

source-mysql-strict-encrypt test report (commit 052a5aee9d) - ❌

⏲️ Total pipeline duration: 12mn30s

Step Result
Validate airbyte-integrations/connectors/source-mysql-strict-encrypt/metadata.yaml
Connector version semver check
QA checks
Build connector tar
Build source-mysql-strict-encrypt docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql-strict-encrypt:integrationTest
Acceptance tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mysql-strict-encrypt test

@octavia-squidington-iii
Copy link
Collaborator

source-alloydb-strict-encrypt test report (commit 052a5aee9d) - ❌

⏲️ Total pipeline duration: 04mn18s

Step Result
Validate airbyte-integrations/connectors/source-alloydb-strict-encrypt/metadata.yaml
Connector version semver check
QA checks
Build connector tar
Build source-alloydb-strict-encrypt docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-alloydb-strict-encrypt:integrationTest
Acceptance tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-alloydb-strict-encrypt test

@nguyenaiden nguyenaiden force-pushed the duy/cdc-source-defined-cursor branch from b661b1f to 54f4a11 Compare July 12, 2023 19:05
@nguyenaiden
Copy link
Contributor Author

nguyenaiden commented Jul 12, 2023

/legacy-test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/5536228889
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/5536228889
No Python unittests run

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:615: The previous and actual discovered catalogs are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:624: Backward compatibility tests are disabled for version 2.1.1.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:712: This tests currently leads to too much failures. We need to fix the connectors at scale first.
=================== 70 passed, 7 skipped in 83.48s (0:01:23) ===================

@nguyenaiden
Copy link
Contributor Author

nguyenaiden commented Jul 12, 2023

/legacy-test connector=connectors/source-postgres-strict-encrypt

🕑 connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/5536256927
✅ connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/5536256927
No Python unittests run

Build Passed

Test summary info:

All Passed

@nguyenaiden
Copy link
Contributor Author

nguyenaiden commented Jul 12, 2023

/legacy-test connector=connectors/source-alloydb

🕑 connectors/source-alloydb https://github.com/airbytehq/airbyte/actions/runs/5536261878
✅ connectors/source-alloydb https://github.com/airbytehq/airbyte/actions/runs/5536261878
No Python unittests run

Build Passed

Test summary info:

All Passed

@nguyenaiden
Copy link
Contributor Author

nguyenaiden commented Jul 12, 2023

/test connector=connectors/source-alloydb-strict-encrypt

⚠️ The test slash command is now deprecated.

The connector tests are automatically triggered as CI checks.

Please use /legacy-test if you need to test CDK or CAT changes.

Please join post to #pipeline-complaint-hotline slack channel if something is not working as expected.

@nguyenaiden
Copy link
Contributor Author

nguyenaiden commented Jul 12, 2023

/legacy-test connector=connectors/source-alloydb-strict-encrypt

🕑 connectors/source-alloydb-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/5536628041
✅ connectors/source-alloydb-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/5536628041
No Python unittests run

Build Passed

Test summary info:

All Passed

Copy link
Contributor

@pedroslopez pedroslopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review much on the tests side, leaving that to the sources team.

From ops side looks good, seeing breaking changes checks correctly indicated as skipped in the test logs for the affected version:

SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:624: Backward compatibility tests are disabled for version 2.1.1.

Should make sure acceptance tests are passing for the connectors to be published - it looks like they are.

Build failure for destination-snowflake is also failing on master and this PR does not touch it, so should be ok to release.

Comment on lines 277 to +278
.map(PostgresCatalogHelper::setIncrementalToSourceDefined)
.map(PostgresCatalogHelper::setDefaultCursorFieldForCdc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 only for cdc, we are setting the default cursor.

@nguyenaiden
Copy link
Contributor Author

/approve-and-merge reason="All CI steps already passed. And manual tests done to verify. Author: Duy Nguyen, copilot=Ed Gao"

@octavia-approvington
Copy link
Contributor

Our work here is done
done

@octavia-approvington octavia-approvington merged commit 633c939 into master Jul 13, 2023
@octavia-approvington octavia-approvington deleted the duy/cdc-source-defined-cursor branch July 13, 2023 16:12
@octavia-squidington-iii
Copy link
Collaborator

source-mysql-strict-encrypt test report (commit 5c2e4796ab) - ❌

⏲️ Total pipeline duration: 12mn58s

Step Result
Validate airbyte-integrations/connectors/source-mysql-strict-encrypt/metadata.yaml
Connector version semver check
QA checks
Build connector tar
Build source-mysql-strict-encrypt docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql-strict-encrypt:integrationTest
Acceptance tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mysql-strict-encrypt test

efimmatytsin pushed a commit to scentbird/airbyte that referenced this pull request Jul 27, 2023
)

* use LSN as default cursor for postgres CDC

* Fixed static constant

* Set lsn default cursor value for postgres sync

* Bumped metadata and dockerfile versions

* Disable acceptance backwards compatibility discovery test as this is a breaking change

---------

Co-authored-by: Conor <cpdeethree@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants